Abstract

With the explosive growth of wireless data traffic, incumbent wireless service providers (WSPs)
face critical challenges in provisioning spectrum resources. Given the permission of unlicensed access
to TV white spaces, WSPs can alleviate their burden by exploiting the concept of "capacity offload" to
transfer part of their traffic load to unlicensed spectrum. For such use cases, a central problem is how
WSPs coexist with one another, since all of them may access the unlicensed spectrum without coordination
and thereby interfere with each other. Game theory provides tools for predicting the behavior of WSPs, and we
formulate the coexistence problem under the framework of non-cooperative games as a capacity offload
game (COG). We show that a COG always possesses at least one pure-strategy Nash equilibrium (NE),
and does not have any mixed-strategy NE. The analysis provides a full characterization of the structure
of the NEs in two-player COGs. When the game is played repeatedly and each WSP individually updates
its strategy based on its best-response function, the resulting process forms a best-response dynamic. We
establish that, for two-player COGs, alternating-move best-response dynamics always converge to an NE,
while simultaneous-move best-response dynamics do not always converge to an NE when multiple
NEs exist. When there are more than two players in a COG, if the network configuration satisfies
certain conditions so that the resulting best-response dynamics become linear, both simultaneous-move
and alternating-move best-response dynamics are guaranteed to converge to the unique NE.

Index Terms

best response; capacity offload; Nash equilibrium; non-cooperative game; power allocation; unlicensed
spectrum
I. Introduction

With the explosive growth of wireless data traffic from new applications and devices such as smartphones, a
solution for wireless service providers (WSPs) is to make use of the concept of "capacity offload":
transferring part of their traffic load off the main system, thereby relieving the load and
improving the overall system capacity. In industry, femtocell and WiFi networks are the primary
candidates for capacity offload [1], [2]. As the Federal Communications Commission (FCC) and
other regulatory bodies have recently permitted unlicensed access to TV white space spectrum
[3], it becomes possible for WSPs to use dynamic spectrum access techniques to offload their
traffic from their own licensed bands (i.e., private bands) to the publicly available unlicensed
bands (i.e., shared bands). A natural question thus arises: what is the impact of the
additional unlicensed spectrum on coexistence? A general drawback associated with unlicensed
spectrum is the so-called tragedy of the commons [4], i.e., the spectrum may be overused by
WSPs that pay no admission fee, and the communication may encounter excessive interference [5].

We consider a simplified network model as depicted in Figure 1. Suppose a time-division
channel access scheme, so that in each time slot only one user equipment is active in each
WSP's network. The WSPs utilize the additional unlicensed spectrum simultaneously, due to the
absence of coordination among different WSPs on spectrum allocation. A user equipment can
communicate with its serving WSP in the corresponding private band and the shared band at the
same time. Communication in the private band is free from interference from other WSPs, while
in the shared band, a receiver experiences interference from other WSPs' transmissions. A
transmitter has an average power constraint and thus needs to allocate its power budget between
its private band and the shared band. The power allocation strategies of the WSPs are therefore
coupled through the shared band, and this coupling determines their behaviors.

Game theory is a powerful tool in analyzing and predicting outcomes of interactive decision 
making processes. In this work, we use game theory to analyze the interaction among WSPs, 
and formulate such interaction into a capacity offload game (COG): WSPs are players of the 
game, power allocation schemes (i.e., how to split each WSP's power budget between private 
band and shared band) constitute the strategy space of each WSP, and achievable rates are the 
utility functions of WSPs. Due to the lack of coordination, the relationship among the WSPs is
competitive; that is, a COG is non-cooperative and each WSP would choose a power allocation
strategy to attempt to maximize its achievable rate regardless of other WSPs.

There has recently been a heightened interest in the applications of game theory in wireless
networks; see, e.g., [6] and references therein. Works whose models bear similarities to ours,
however, are relatively few, and are summarized as follows. In [7], the authors considered a network
consisting of two interfering links for which both sources have access to a common relay which
has access to bands orthogonal to that used by the sources. In [8], the authors considered a
network consisting of two source-destination links and three bands, assuming that one of the
bands is shared by the transmitters while the other two bands are private. The analysis therein
is based on results of supermodular games, with strategy spaces defined in a way such that the
game has strategic complementarities. Unfortunately, that approach does not apply to games
with more than two players, such as those we consider in this paper.

Through the analysis of the COG, we arrive at a number of interesting conclusions. The COG
always has at least one pure-strategy NE when all WSPs adopt deterministic strategies (i.e.,
pure strategies). When WSPs reach an NE, none of them has an incentive to unilaterally
deviate from the NE, since such a deviation cannot increase its utility [9]. Even if we
permit mixed strategies, i.e., a WSP choosing its strategy randomly according to a probability
distribution over its strategy space, the COG does not possess any mixed-strategy NE. The NE of
a COG is not necessarily unique, and the number of NEs depends upon the network parameters.
As an illustration, we fully characterize the NEs for two-player COGs, for all possible network
parameters.

We then examine the behavior of best-response learning algorithms [10], which require only
local information at each WSP, to study how to reach an NE in a distributed way. In
best-response learning algorithms, a COG is played repeatedly and in each time slot, a WSP updates
its strategy based on its best-response function with respect to other WSPs' strategies in the
previous time slot. WSPs can update their strategies either simultaneously or alternately. Our
findings are as follows. For a two-player COG, alternating-move best-response dynamics always
converge to an NE, regardless of the number of NEs in the game, whereas simultaneous-move
best-response dynamics do not always converge to an NE when multiple NEs exist, depending
upon the initial strategies adopted. For COGs with more than two WSPs, the convergence
property is generally difficult to analyze due to the nonlinearity in the best-response functions.
But we find that when the network parameters are configured in a way such that each WSP's
best-response function is linear with respect to other WSPs' strategies, both simultaneous-move
and alternating-move best-response dynamics are guaranteed to converge to the unique NE of
the COG.

The remaining part of the paper is organized as follows. Section II formulates the capacity
offload problem as a non-cooperative game and sets some basic assumptions for analysis. Section
III establishes the existence of pure-strategy NE and the non-existence of mixed-strategy NE for
COGs, and characterizes the structure of the NEs in two-player COGs. Section IV analyzes the
convergence properties of best-response learning algorithms for approaching NEs. Section V
provides numerical results to illustrate the behaviors of learning algorithms. Finally, Section VI
concludes the paper.

II. Model and Game-theoretic Considerations 

A. System model 

An abstract model for capacity offloading may be described as follows; see also Figure 2.
Assume that a set, $\mathcal{K} = \{1, \cdots, K\}$, of access points from different WSPs are deployed in a
geographic area. Suppose that each WSP, say WSP $k$, occupies a private band, whose bandwidth
is $B_{p,k}$ Hertz. A shared band, whose bandwidth is $B_s$ Hertz, can be accessed and made use of
by all WSPs. So the total available bandwidth, $B$ Hertz, consists of all WSPs' private bands and
the shared band; that is, $B = (\alpha + \sum_{k \in \mathcal{K}} \beta_k) B$ Hertz, where $\alpha = B_s / B$ and
$\beta_k = B_{p,k} / B$, for $k \in \mathcal{K}$.

We assume that the transmitter of WSP $k$ has an average power constraint $P_k$, and that it can
arbitrarily allocate its power between its own private band and the shared band. Denote by $x_k \in [0, 1]$
the fraction of WSP $k$'s power allocated to its private band; that is, the WSP transmits at power
$x_k P_k$ in its private band, and at $(1 - x_k) P_k$ in the shared band.$^1$ WSPs deploy a time-division
channel access scheme, so that in each time slot, a WSP exclusively transmits information to a
single user equipment. All receivers are subject to additive white Gaussian noise with zero mean
and power spectral density $n_0$. We further assume that all the link channels are frequency-flat,
among which $|h_{k,k}|^2$ is the channel gain of the link from WSP $k$'s transmitter to its corresponding
receiver in both its private band and the shared band, and $|h_{j,k}|^2$ is the channel gain of the link
from WSP $j$'s transmitter to WSP $k$'s receiver.

$^1$ It is reasonable for each WSP to use up all its power. If there is power left unallocated for a WSP, it can allocate this
residual power to its private band to increase its achievable rate without creating extra interference to other WSPs.

The considered frequency-flat fading model is somewhat restrictive in that the channel gains
over the private bands and the shared band are set identical. This would occur mainly in situations
where the private bands and the shared band are not far apart in frequency. Using such a simplified
model, however, we are able to convey our key ideas effectively without dwelling on tedious
technicalities. Extensions to more general fading models are possible; see, e.g., [8]. In the sequel,
we use $c_{j,k}$ to replace $|h_{j,k}|^2 / B$ and $c_{k,k}$ to replace $|h_{k,k}|^2 / B$, to simplify notation.

We denote the symbols transmitted by WSP $k$'s transmitter by $s_{p,k}$ (in its private band) and
$s_{s,k}$ (in the shared band), with time indices suppressed. So the received baseband signals at WSP
$k$'s receiver can be written as $y_{p,k} = h_{k,k} s_{p,k} + z_{p,k}$ (in its private band) and
$y_{s,k} = h_{k,k} s_{s,k} + \sum_{j \in \mathcal{K} \setminus \{k\}} h_{j,k} s_{s,j} + z_{s,k}$ (in the shared band), where
$E[|z_{p,k}|^2] = n_0 \beta_k B$, $E[|z_{s,k}|^2] = n_0 \alpha B$, $E[|s_{p,k}|^2] = x_k P_k$ and $E[|s_{s,k}|^2] = (1 - x_k) P_k$.
We consider a naive coding scheme in which
all transmitted symbols follow Gaussian codebooks and all receivers adopt single-user decoding,
treating others' signals as noise. Hence, WSP $k$'s achievable rate (normalized by $B$) is:

\[
u_k(\mathbf{x}) = \underbrace{\alpha \log_2 \left( 1 + \frac{(1 - x_k) P_k c_{k,k}}{n_0 \alpha + \sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}} \right)}_{\text{in the shared band}}
+ \underbrace{\beta_k \log_2 \left( 1 + \frac{x_k P_k c_{k,k}}{n_0 \beta_k} \right)}_{\text{in the private band}} \quad \text{(bits/Hertz)}, \tag{1}
\]

where vector $\mathbf{x} = (x_1, \cdots, x_K)^T$ represents the power allocation strategies adopted by all the
WSPs. Due to the lack of coordination among WSPs, it is reasonable to suppose that each WSP
adopts its individual strategy to attempt to maximize its own achievable rate, regardless of other
WSPs' rates. This situation is formally described by the COG, introduced in the following.
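As a concrete reading of the rate expression (1), the following minimal Python sketch evaluates $u_k(\mathbf{x})$ for a given strategy profile. The function name `utility`, the 0-based player indexing, and all numerical values are illustrative assumptions introduced here; they do not come from the paper.

```python
import math


def utility(k, x, P, c, alpha, beta, n0):
    """Normalized achievable rate u_k(x) of WSP k as in (1), in bits/Hertz.

    x[j]    : fraction of WSP j's power placed on its private band
    P[j]    : power budget of WSP j
    c[j][k] : normalized gain |h_{j,k}|^2 / B from transmitter j to receiver k
    alpha   : shared-band fraction of the total bandwidth
    beta[k] : private-band fraction of the total bandwidth for WSP k
    n0      : noise power spectral density
    """
    # Aggregate interference received by WSP k in the shared band.
    interference = sum(
        (1 - x[j]) * P[j] * c[j][k] for j in range(len(x)) if j != k
    )
    shared_sinr = (1 - x[k]) * P[k] * c[k][k] / (n0 * alpha + interference)
    private_snr = x[k] * P[k] * c[k][k] / (n0 * beta[k])
    return alpha * math.log2(1 + shared_sinr) + beta[k] * math.log2(1 + private_snr)


# Hypothetical two-WSP example (all numbers are illustrative assumptions).
P = [1.0, 1.0]
c = [[1.0, 0.5], [0.5, 1.0]]  # c[j][k]: transmitter j -> receiver k
alpha, beta, n0 = 0.4, [0.3, 0.3], 0.1

# Rate of the first WSP when the other WSP stays off the shared band ...
rate_alone = utility(0, [0.5, 1.0], P, c, alpha, beta, n0)
# ... and when the other WSP puts all of its power on the shared band.
rate_shared = utility(0, [0.5, 0.0], P, c, alpha, beta, n0)
```

In this example `rate_alone` exceeds `rate_shared`, which reflects the coupling through the interference term in the shared band.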

B. Game-theoretic model 

We define the pure-strategy form of COG as $\mathcal{G} = (\mathcal{K}, (\mathcal{X}_k)_{k \in \mathcal{K}}, (u_k)_{k \in \mathcal{K}})$, in which $\mathcal{K}$ represents
the set of players (i.e., the WSPs), $\mathcal{X}_k$ is the set of WSP $k$'s pure strategies, i.e., the power
allocation ratio $x_k \in [0, 1]$, and the utility function $u_k$ is given by (1). More generally, we
may also consider mixed strategies: each player $k \in \mathcal{K}$ chooses its strategy $x_k$ following a
discrete probability distribution $\pi_k(x_k) = (\pi^{(1)}, \cdots, \pi^{(N)})^T \in \Delta(\mathcal{X}_k)$ over a finite strategy
space (where $\pi^{(i)}$ represents the probability that WSP $k$ chooses strategy $x_k^{(i)}$) or following
a probability density function $\pi_k(x_k) \in \Delta(\mathcal{X}_k)$ over a continuous strategy space, where
$\int_0^1 \pi_k(x_k) dx_k = 1$, $\pi_k(x_k) \geq 0$. For mixed strategies, the utility function $\bar{u}_k$ of player $k$ is defined
as the expectation of $u_k$ in (1) with respect to the probability distributions of all players' strategies.
We can define the mixed strategic-form game for COG as $\bar{\mathcal{G}} = (\mathcal{K}, (\Delta(\mathcal{X}_k))_{k \in \mathcal{K}}, (\bar{u}_k)_{k \in \mathcal{K}})$.

The definitions of pure-strategy and mixed-strategy NEs are as follows [11].

Definition 1 (Pure-strategy Nash equilibrium): A strategy profile $\mathbf{x}^* = (x_1^*, \cdots, x_K^*)^T$ is a
pure-strategy NE for the COG, if $\forall k \in \mathcal{K}$ and $\forall x_k' \in \mathcal{X}_k$, $u_k(x_k^*, \mathbf{x}_{-k}^*) \geq u_k(x_k', \mathbf{x}_{-k}^*)$, where the
subscript $-k$ represents all the players other than player $k$.

Definition 2 (Mixed-strategy Nash equilibrium): A mixed-strategy profile $\pi^*$ is a mixed-strategy
NE for the COG, if $\forall k \in \mathcal{K}$ and $\forall \pi_k' \in \Delta(\mathcal{X}_k)$, $\bar{u}_k(\pi_k^*, \pi_{-k}^*) \geq \bar{u}_k(\pi_k', \pi_{-k}^*)$, where
$\bar{u}_k(\pi) = E_{\pi}[u_k(x_k, \mathbf{x}_{-k})] : \Delta(\mathcal{X}_1) \times \cdots \times \Delta(\mathcal{X}_K) \to \mathbb{R}$.

III. Analysis of Nash Equilibria 

In this section, we establish the existence of pure-strategy NE, as well as the nonexistence
of mixed-strategy NE, for COGs. With the aid of explicit best-response functions, we fully
characterize the structure of NEs for two-player COGs, by displaying the relationship between
network parameters and the NEs.

A. Existence of Pure-strategy NE 

The existence of pure-strategy NEs for COGs is an application of the following lemma.

Lemma 1 (Rosen [12]): At least one NE exists for every concave $K$-player game.

For a concave game, the joint strategy set is convex, closed and bounded, and for each player
$k \in \mathcal{K}$, its utility function is continuous with respect to the joint strategy and is concave with
respect to player $k$'s own strategy. We then state the existence theorem as follows:

Theorem 1 (Existence of pure-strategy NE): A pure-strategy COG, $\mathcal{G} = (\mathcal{K}, (\mathcal{X}_k)_{k \in \mathcal{K}}, (u_k)_{k \in \mathcal{K}})$,
always possesses at least one NE, i.e., there is at least one strategy profile $\mathbf{x}^* = (x_1^*, \cdots, x_K^*)^T$
such that $u_k(x_k^*, \mathbf{x}_{-k}^*) \geq u_k(x_k', \mathbf{x}_{-k}^*)$ holds, $\forall k \in \mathcal{K}$, $\forall x_k' \in \mathcal{X}_k$.

Proof: By Lemma 1, it suffices to verify that the pure-strategy COG is a concave $K$-player
game:
1) The pure-strategy profile $\mathbf{x}$ is from a convex, closed and bounded set, namely $\mathcal{X}_1 \times \cdots \times \mathcal{X}_K =
\{\mathbf{x} = (x_1, \cdots, x_K) \mid x_k \in [0, 1], \text{ for } k = 1, \cdots, K\}$;

2) Player $k$'s utility function, $u_k(\mathbf{x})$, defined by (1), is continuous in $\mathbf{x}$ and is concave in $x_k$,
since

\[
\frac{\partial^2 u_k(\mathbf{x})}{\partial x_k^2} = -\frac{1}{\ln 2} \left[ \frac{\alpha P_k^2 c_{k,k}^2}{\left[ n_0 \alpha + \sum_{j \in \mathcal{K}} (1 - x_j) P_j c_{j,k} \right]^2} + \frac{\beta_k P_k^2 c_{k,k}^2}{\left( n_0 \beta_k + x_k P_k c_{k,k} \right)^2} \right] < 0.
\]



B. Nonexistence of Mixed-strategy NE 

Theorem 1 guarantees that at least one pure-strategy NE exists, but it remains unclear
whether a mixed-strategy NE exists. In the following, we rule out this possibility.

Theorem 2 (Nonexistence of mixed-strategy NE): For a COG, there is no mixed strategy that
makes the players reach an NE.$^2$

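Proof: For a given $k$, let us fix players $-k$'s strategy probability distribution as $\pi_{-k}^*(\mathbf{x}_{-k}) \in
\Delta(\mathcal{X}_{-k})$. Regarding the expected utility $\bar{u}_k$ of player $k$, we have

\begin{align*}
\bar{u}_k(\pi_k, \pi_{-k}^*) &= \int_{\mathbf{x}_{-k}} \int_0^1 u_k(x_k, \mathbf{x}_{-k}) \pi_k(x_k) \pi_{-k}^*(\mathbf{x}_{-k}) \, dx_k \, d\mathbf{x}_{-k} \\
&= \int_{\mathbf{x}_{-k}} \int_0^1 \left[ \alpha \log_2 \left( 1 + \frac{(1 - x_k) P_k c_{k,k}}{n_0 \alpha + \sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}} \right)
 + \beta_k \log_2 \left( 1 + \frac{x_k P_k c_{k,k}}{n_0 \beta_k} \right) \right] \pi_k(x_k) \pi_{-k}^*(\mathbf{x}_{-k}) \, dx_k \, d\mathbf{x}_{-k} \\
&\overset{(a)}{\leq} \int_{\mathbf{x}_{-k}} \left[ \alpha \log_2 \left( 1 + \frac{\int_0^1 (1 - x_k) P_k c_{k,k} \pi_k(x_k) \, dx_k}{n_0 \alpha + \sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}} \right)
 + \beta_k \log_2 \left( 1 + \frac{\int_0^1 x_k P_k c_{k,k} \pi_k(x_k) \, dx_k}{n_0 \beta_k} \right) \right] \pi_{-k}^*(\mathbf{x}_{-k}) \, d\mathbf{x}_{-k} \\
&= \int_{\mathbf{x}_{-k}} \left[ \alpha \log_2 \left( 1 + \frac{(1 - E[x_k]) P_k c_{k,k}}{n_0 \alpha + \sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}} \right)
 + \beta_k \log_2 \left( 1 + \frac{E[x_k] P_k c_{k,k}}{n_0 \beta_k} \right) \right] \pi_{-k}^*(\mathbf{x}_{-k}) \, d\mathbf{x}_{-k},
\end{align*}

where (a) follows from Jensen's inequality. Since $u_k(\mathbf{x})$ is strictly concave in $x_k$ (cf. the proof of
Theorem 1), we achieve equality in (a) if and only if $x_k = E[x_k]$ with probability 1, which
means that player $k$'s strategy is deterministic. So it is optimal for player $k$ to adopt a pure
strategy to maximize its utility, and the same conclusion also applies to every other player. Thus
we rule out the possibility of mixed strategies being optimal and complete the proof. ■

$^2$ Pure strategies are degenerate mixed strategies, and are excluded from the consideration.

According to Theorem 2, in the following we only need to focus on pure strategies.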
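The Jensen step in the proof can be checked numerically. The sketch below compares the expected rate of a two-point mixed strategy with the rate of the pure strategy at its mean, while holding the opponents' interference fixed. The helper name `utility_k`, the fixed interference level, and all numbers are assumptions made for illustration only.

```python
import math


def utility_k(xk, interference, Pk, ckk, alpha, beta_k, n0):
    """Rate of one WSP as in (1), with the opponents' aggregate interference held fixed."""
    shared = (1 - xk) * Pk * ckk / (n0 * alpha + interference)
    private = xk * Pk * ckk / (n0 * beta_k)
    return alpha * math.log2(1 + shared) + beta_k * math.log2(1 + private)


# Illustrative (assumed) parameters.
params = dict(interference=0.3, Pk=1.0, ckk=1.0, alpha=0.4, beta_k=0.3, n0=0.1)

# A mixed strategy: play x_k = 0.2 or x_k = 0.8 with equal probability.
support = [0.2, 0.8]
probs = [0.5, 0.5]

mean_x = sum(p * x for p, x in zip(probs, support))
expected_mixed = sum(p * utility_k(x, **params) for p, x in zip(probs, support))
pure_at_mean = utility_k(mean_x, **params)
```

Because the rate is strictly concave in $x_k$, `pure_at_mean` is strictly larger than `expected_mixed` for any non-degenerate distribution, which is the content of step (a).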



C. Best-response Functions 

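Since $u_k(\mathbf{x})$ is strictly concave in $x_k$, player $k$ may choose a pure strategy $x_k \in [0, 1]$ satisfying

\[
\frac{\partial u_k(x_k, \mathbf{x}_{-k})}{\partial x_k} = 0 \tag{2}
\]

to maximize its utility when its opponent players adopt strategy $\mathbf{x}_{-k}$. Solving (2), we get

\[
x_k = f_k(\mathbf{x}_{-k}) = \frac{\beta_k}{\alpha + \beta_k} \left[ 1 + \frac{\sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}}{P_k c_{k,k}} \right]. \tag{3}
\]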
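For readers who want the intermediate steps between (2) and (3), the following worked derivation may help. It uses the shorthand $I_k$ for the noise-plus-interference term in the shared band; this symbol is introduced here for convenience and does not appear in the original text.

```latex
\begin{align*}
I_k &\triangleq n_0 \alpha + \sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}, \\
\frac{\partial u_k}{\partial x_k}
  &= \frac{1}{\ln 2} \left[ \frac{\beta_k P_k c_{k,k}}{n_0 \beta_k + x_k P_k c_{k,k}}
     - \frac{\alpha P_k c_{k,k}}{I_k + (1 - x_k) P_k c_{k,k}} \right] = 0 \\
\Longrightarrow \quad
  & \beta_k \left[ I_k + (1 - x_k) P_k c_{k,k} \right] = \alpha \left[ n_0 \beta_k + x_k P_k c_{k,k} \right] \\
\Longrightarrow \quad
  & (\alpha + \beta_k) \, x_k P_k c_{k,k} = \beta_k \left[ P_k c_{k,k} + I_k - n_0 \alpha \right] \\
\Longrightarrow \quad
  & x_k = \frac{\beta_k}{\alpha + \beta_k}
    \left[ 1 + \frac{\sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}}{P_k c_{k,k}} \right].
\end{align*}
```

Note that the noise term $n_0$ cancels, so the stationary point depends only on the bandwidth fractions and on the ratio of aggregate interference to the player's own received power.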

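Taking into account that $x_k$ is no greater than one, we obtain the best-response function $\mathrm{BR}_k$ of
player $k$ as

\[
\mathrm{BR}_k(\mathbf{x}_{-k}) = \min \left\{ \frac{\beta_k}{\alpha + \beta_k} \left[ 1 + \frac{\sum_{j \in \mathcal{K} \setminus \{k\}} (1 - x_j) P_j c_{j,k}}{P_k c_{k,k}} \right], 1 \right\}. \tag{4}
\]

We can determine the NEs of a COG by solving the equations

\[
x_k = \mathrm{BR}_k(\mathbf{x}_{-k}), \quad \text{for } k = 1, \cdots, K. \tag{5}
\]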



Note that the equations may have multiple solutions, corresponding to multiple NEs, as illustrated 
in the next subsection. 
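The closed form (4) can be sanity-checked against a brute-force maximization of the rate (1). The sketch below does this for one player of a two-player example; the function names, the 0-based indexing, the grid resolution, and all parameter values are illustrative assumptions.

```python
import math


def utility(k, x, P, c, alpha, beta, n0):
    """Normalized rate u_k(x) from (1)."""
    interference = sum((1 - x[j]) * P[j] * c[j][k] for j in range(len(x)) if j != k)
    shared = (1 - x[k]) * P[k] * c[k][k] / (n0 * alpha + interference)
    private = x[k] * P[k] * c[k][k] / (n0 * beta[k])
    return alpha * math.log2(1 + shared) + beta[k] * math.log2(1 + private)


def best_response(k, x, P, c, alpha, beta):
    """Closed-form best response BR_k from (4); note that it does not depend on n0."""
    interference = sum((1 - x[j]) * P[j] * c[j][k] for j in range(len(x)) if j != k)
    unclipped = beta[k] / (alpha + beta[k]) * (1 + interference / (P[k] * c[k][k]))
    return min(unclipped, 1.0)


# Illustrative (assumed) parameters; c[j][k] is the gain from transmitter j to receiver k.
P = [1.0, 1.0]
c = [[1.0, 0.5], [0.5, 1.0]]
alpha, beta, n0 = 0.4, [0.3, 0.3], 0.1
x = [0.5, 0.2]  # current strategies; only x[1] matters for player 0's response

br = best_response(0, x, P, c, alpha, beta)

# Brute-force search over a grid of candidate strategies for player 0.
grid = [i / 1000 for i in range(1001)]
best_on_grid = max(grid, key=lambda v: utility(0, [v, x[1]], P, c, alpha, beta, n0))
```

With these numbers the closed form gives a value of about 0.6, and the grid maximizer agrees with it to within the grid spacing.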



D. Characterization of NEs for Two-player COGs 

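When a COG has only two players, it is convenient to describe the relationship between the
number and behavior of NEs and the network parameters. The two players' best-response functions
are:

\begin{align}
x_1 = \mathrm{BR}_1(x_2) &= \min \left\{ \frac{\beta_1}{\alpha + \beta_1} \left[ 1 + \frac{P_2 c_{2,1}}{P_1 c_{1,1}} (1 - x_2) \right], 1 \right\}, \tag{6} \\
x_2 = \mathrm{BR}_2(x_1) &= \min \left\{ \frac{\beta_2}{\alpha + \beta_2} \left[ 1 + \frac{P_1 c_{1,2}}{P_2 c_{2,2}} (1 - x_1) \right], 1 \right\}. \tag{7}
\end{align}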
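When we draw both players' best-response functions in the same $(x_1, x_2)$-coordinate diagram,
by the definition of NE, the intersection points of the two best-response functions correspond to
the NEs of the COG. Depending upon the network parameters, the number and locations of the
NEs vary, as illustrated in Figures 3-7. The behavior of NEs can be summarized in the following
theorem.

Theorem 3: A two-player COG has
1) a unique NE, given by the solution of

\begin{align*}
x_1 &= \frac{\beta_1}{\alpha + \beta_1} \left[ 1 + \frac{P_2 c_{2,1}}{P_1 c_{1,1}} (1 - x_2) \right], \\
x_2 &= \frac{\beta_2}{\alpha + \beta_2} \left[ 1 + \frac{P_1 c_{1,2}}{P_2 c_{2,2}} (1 - x_1) \right],
\end{align*}

when the network parameters satisfy (see Figure 3)

\[
\frac{c_{1,2}}{c_{2,2}} < \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} < \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1}; \tag{8}
\]

2) a unique NE, $(x_1, x_2) = \left( \frac{\beta_1}{\alpha + \beta_1}, 1 \right)$, when (see Figure 4)

\[
\frac{c_{1,2}}{c_{2,2}} > \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} < \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1}; \tag{9}
\]

3) a unique NE, $(x_1, x_2) = \left( 1, \frac{\beta_2}{\alpha + \beta_2} \right)$, when (see Figure 4)

\[
\frac{c_{1,2}}{c_{2,2}} < \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} > \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1}; \tag{10}
\]

4) three NEs, namely those listed in cases 1)-3) above together, when (see Figure 5)

\[
\frac{c_{1,2}}{c_{2,2}} > \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} > \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1}. \tag{11}
\]

For this case, there are two singular subcases. If exactly one of the two inequalities becomes
an equality, the three NEs collapse into two, since the NE in case 1) coincides with the NE
in case 2) or 3); see Figure 6. If both inequalities become equalities, there are an
infinite number of NEs since the two lines

\begin{align*}
x_1 &= \frac{\beta_1}{\alpha + \beta_1} \left[ 1 + \frac{P_2 c_{2,1}}{P_1 c_{1,1}} (1 - x_2) \right], \\
x_2 &= \frac{\beta_2}{\alpha + \beta_2} \left[ 1 + \frac{P_1 c_{1,2}}{P_2 c_{2,2}} (1 - x_1) \right]
\end{align*}

coincide for all $x_1 \in \left[ \frac{\beta_1}{\alpha + \beta_1}, 1 \right]$; see Figure 7.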
Let us make a few comments regarding Theorem 3. Throughout its cases, we notice that the
behavior of NEs depends upon the comparison between two pairs of quantities,

\[
\frac{c_{1,2} P_1}{\alpha + \beta_1} \text{ versus } \frac{c_{2,2} P_2}{\beta_2}, \quad \text{and} \quad \frac{c_{2,1} P_2}{\alpha + \beta_2} \text{ versus } \frac{c_{1,1} P_1}{\beta_1}. \tag{12}
\]

In the first comparison, $c_{1,2} P_1 / (\alpha + \beta_1)$ can be viewed as the interference strength to player
2, when player 1 evenly distributes its power across its private band and the shared band, and
$c_{2,2} P_2 / \beta_2$ can be viewed as the signal strength to player 2 when it exclusively uses its private
band. The two quantities in the second comparison can be interpreted correspondingly. It
is interesting that the comparison between these seemingly unrelated quantities determines the
behavior of NEs in the game.

The NE in case 1) implies that both players allocate a portion of their power budgets to the
shared band, so that they indeed coexist, tolerating a certain amount of interference. According
to (8), this occurs when the interference strengths to both players are weak. The NE in case 2)
or 3), $\left( \frac{\beta_1}{\alpha + \beta_1}, 1 \right)$ or $\left( 1, \frac{\beta_2}{\alpha + \beta_2} \right)$, implies that player 2 or player 1 completely retreats from the
shared band, while the other player evenly distributes its power (in proportion to bandwidth) across its private band and the
shared band. This is the unique stable operating point (i.e., NE) when one player experiences
strong interference while the other's interference is still weak. In case 4), the interference
strengths to both players are strong, and it is interesting that the system may then reach any of
three equilibrium operating points: players coexisting within the shared band, or either player
completely retreating from the shared band.
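The case analysis of Theorem 3 can be turned into a small classifier. The sketch below compares the two gain ratios against the thresholds in (8)-(11) and reports which regime applies. The function name, the returned labels, the 0-based indexing (player 1 is index 0), and the use of `math.isclose` to detect the singular subcases are all assumptions made for illustration; boundary configurations that the theorem does not list are reported as such.

```python
import math


def classify_two_player(P, c, alpha, beta):
    """Report the NE regime of Theorem 3; c[j][k] is the gain from transmitter j to receiver k."""
    ratio_12 = c[0][1] / c[1][1]  # c_{1,2} / c_{2,2}
    ratio_21 = c[1][0] / c[0][0]  # c_{2,1} / c_{1,1}
    thr_12 = P[1] / P[0] * (alpha + beta[0]) / beta[1]
    thr_21 = P[0] / P[1] * (alpha + beta[1]) / beta[0]

    eq_12 = math.isclose(ratio_12, thr_12)
    eq_21 = math.isclose(ratio_21, thr_21)
    strong_12 = ratio_12 > thr_12 and not eq_12  # player 2 suffers strong interference
    strong_21 = ratio_21 > thr_21 and not eq_21  # player 1 suffers strong interference

    if eq_12 and eq_21:
        return "infinitely many NEs"
    if (eq_12 and strong_21) or (eq_21 and strong_12):
        return "two NEs"
    if eq_12 or eq_21:
        return "boundary case not listed in Theorem 3"
    if strong_12 and strong_21:
        return "three NEs"
    if strong_12:
        return "unique NE: player 2 leaves the shared band"
    if strong_21:
        return "unique NE: player 1 leaves the shared band"
    return "unique NE: both players use the shared band"


# Illustrative (assumed) symmetric setting with unit powers and unit direct gains.
alpha, beta, P = 0.4, [0.3, 0.3], [1.0, 1.0]
weak = classify_two_player(P, [[1.0, 0.5], [0.5, 1.0]], alpha, beta)
strong = classify_two_player(P, [[1.0, 3.0], [3.0, 1.0]], alpha, beta)
one_sided = classify_two_player(P, [[1.0, 3.0], [0.5, 1.0]], alpha, beta)
```

Here the threshold on both ratios is $(\alpha + \beta)/\beta = 7/3$, so cross gains of 0.5 fall in case 1) and cross gains of 3 fall in case 4).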

IV. Best-response Dynamics for Distributed Learning 

Having established the existence of NEs in COGs, in this section, we focus on the behavior
of best-response dynamics for distributed learning, to examine whether the dynamics' evolution
eventually leads all players to reach an NE. We begin by defining the best-response dynamic
learning procedures. We then fully characterize the convergence properties of best-response
dynamics for two-player COGs. Finally, for $K$-player COGs, we establish the convergence
properties of best-response dynamics, when the network parameters are such that the resulting
best-response functions are all linear.
A. Best-response dynamics

We assume that in a best-response dynamic process, whenever a player $k$ decides to update
its strategy at time instant $t$, it possesses the knowledge of the other players' joint strategies $\mathbf{x}_{-k}$
at $t$, and the updating rule is

\[
x_k(t^+) = \mathrm{BR}_k(\mathbf{x}_{-k}(t)). \tag{13}
\]

That is, the updated strategy $x_k$ right after the time instant $t$ is the one that maximizes the utility
$u_k$ given the strategies of the other players at $t$. Inspecting the best-response functions (4), we
note that in order to update $x_k$ it is only necessary for WSP $k$'s receiver to measure the level
of aggregated interference in the shared band, $\sum_{j \in \mathcal{K} \setminus \{k\}} P_j c_{j,k} (1 - x_j)$. Furthermore, we may
rewrite (4) as

\[
x_k = \min \left\{ \frac{\beta_k}{\alpha + \beta_k} \left[ 1 + \frac{1 - x_k}{\mathrm{SIR}_k} \right], 1 \right\}, \tag{14}
\]

where the $x_k$ on the right-hand side is the strategy in use when the measurement is taken, and

\[
\mathrm{SIR}_k \triangleq \frac{P_k c_{k,k} (1 - x_k)}{\sum_{j \in \mathcal{K} \setminus \{k\}} P_j c_{j,k} (1 - x_j)} \tag{15}
\]

denotes the signal-to-interference ratio (SIR) of WSP $k$'s receiver in the shared band.

In the following, we consider two situations. In the first situation, all the players update their
strategies simultaneously. We call this simultaneous-move best-response dynamic (SMBRD). In
the second situation, the players update their strategies sequentially, in a periodic round-robin
fashion. We call this alternating-move best-response dynamic (AMBRD).

For the two kinds of best-response dynamics, we propose the following distributed learning
algorithms. In the algorithm description, we assume for simplicity that time is slotted and updates
occur at the beginning of each time slot.

Simultaneous-Move Best-Response Dynamic (SMBRD):



For each player $k \in \mathcal{K}$:

Step 1: At time $t = 0$, player $k$ selects an initial strategy $x_k(0)$ arbitrarily within $[0, 1]$;
Step 2: At time $t + 1$, given the measurement of $\mathrm{SIR}_k(t)$, player $k$ updates its strategy to
$x_k(t + 1) = \min \left\{ \frac{\beta_k}{\alpha + \beta_k} \left[ 1 + \frac{1 - x_k(t)}{\mathrm{SIR}_k(t)} \right], 1 \right\}$;
Step 3: Increase the time slot index to $t + 1$ and go back to step 2 until all the players'
strategies become stationary or the time index reaches the prescribed maximum number
of iterations.



Alternating-Move Best-Response Dynamic (AMBRD):

Without loss of generality, we assume that the $K$ players update their strategies sequentially and
periodically, from player 1 to player $K$ in each updating cycle.

Step 1: At time $t = 0$, player 1 selects an initial strategy $x_1(0)$ arbitrarily within $[0, 1]$;
Step 2: Each player $k = 2, \cdots, K$ takes its turn to revise its strategy at time $t + k - 1$.
Given the measurement of $\mathrm{SIR}_k(t + k - 1)$, player $k$ updates its strategy to
$x_k(t + k) = \min \left\{ \frac{\beta_k}{\alpha + \beta_k} \left[ 1 + \frac{1 - x_k(t + k - 1)}{\mathrm{SIR}_k(t + k - 1)} \right], 1 \right\}$;
Step 3: Increase the time index to $t + K$, and player 1 updates its strategy according to
$x_1(t + K) = \min \left\{ \frac{\beta_1}{\alpha + \beta_1} \left[ 1 + \frac{1 - x_1(t + K - 1)}{\mathrm{SIR}_1(t + K - 1)} \right], 1 \right\}$. Go back to step 2 until all the players'
strategies become stationary or the time index reaches the prescribed maximum number
of iterations.
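The two procedures above can be sketched in a few lines of Python. This is a simplified illustration rather than a protocol implementation: it evaluates the best response (4) directly from the other players' strategies instead of from a measured SIR, it takes a full initial profile for both dynamics, and it lets players 2 through $K$ move before player 1 in each AMBRD cycle. Function names, tolerances, and parameter values are assumptions.

```python
def best_response(k, x, P, c, alpha, beta):
    """Best response BR_k from (4); c[j][k] is the gain from transmitter j to receiver k."""
    interference = sum((1 - x[j]) * P[j] * c[j][k] for j in range(len(x)) if j != k)
    return min(beta[k] / (alpha + beta[k]) * (1 + interference / (P[k] * c[k][k])), 1.0)


def smbrd(x0, P, c, alpha, beta, max_iter=200, tol=1e-9):
    """Simultaneous moves: every player responds to the previous slot's profile."""
    x = list(x0)
    for _ in range(max_iter):
        new_x = [best_response(k, x, P, c, alpha, beta) for k in range(len(x))]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x, True
        x = new_x
    return x, False


def ambrd(x0, P, c, alpha, beta, max_iter=200, tol=1e-9):
    """Alternating moves: players 2..K update in turn, then player 1 (index 0)."""
    x = list(x0)
    order = list(range(1, len(x))) + [0]
    for _ in range(max_iter):
        previous = list(x)
        for k in order:
            x[k] = best_response(k, x, P, c, alpha, beta)  # uses the latest strategies
        if max(abs(a - b) for a, b in zip(x, previous)) < tol:
            return x, True
    return x, False


# Illustrative (assumed) weak-interference example with two symmetric players.
P = [1.0, 1.0]
c = [[1.0, 0.5], [0.5, 1.0]]
alpha, beta = 0.4, [0.3, 0.3]

x_sm, ok_sm = smbrd([0.0, 1.0], P, c, alpha, beta)
x_am, ok_am = ambrd([0.0, 1.0], P, c, alpha, beta)
```

In this weak-interference example both dynamics settle at the same interior point, $x_1 = x_2 = 9/17$, which solves the two linear equations of case 1) in Theorem 3.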



B. Convergence Analysis for Two-player COGs 

In this subsection, we analyze the simple case where there are only two players in the COG. Let
us begin with the SMBRD, whose execution produces a sequence of strategies of the two players.
From the best-response functions of the two players, we see that the sequence of strategies can
be decomposed into the following two subsequences

Sequence[a]: $(x_1(0), x_2(1)), (x_1(2), x_2(3)), \cdots$,
and Sequence[b]: $(x_2(0), x_1(1)), (x_2(2), x_1(3)), \cdots$,

each of which evolves independently of the other. Sequence[a] is uniquely determined by
player 1's initial strategy $x_1(0)$, and Sequence[b] by $x_2(0)$. So we can study each sequence's
convergence property individually. Only when these two subsequences' limits coincide does the
SMBRD converge; otherwise the SMBRD exhibits a cycling behavior between the limits of
the two subsequences. In the following discussion, for notational simplicity, we abbreviate (3)
as $x_k = f_k(x_{-k}) = -b_k x_{-k} + a_k$, where $a_k, b_k > 0$ for $k = 1, 2$, and denote the intersection of
the two lines by $(x_1^*, x_2^*)$. We then discuss the convergence property of the two sequences under
different network parameters.

1) When the COG has a unique NE:

a) When the condition

\[
\frac{c_{1,2}}{c_{2,2}} < \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} < \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1} \tag{16}
\]

is satisfied, the unique NE is $(x_1^*, x_2^*)$, the solution of $x_k = -b_k x_{-k} + a_k$ ($k = 1, 2$);
see Figure 3. By symmetry we only discuss Sequence[a]'s convergence property;
the discussion for Sequence[b] is similar. From the conditions (16), if $x_1(t) > f_2^{-1}(1)$,
we have

\[
x_1(t + 2) - x_1^* = b_1 b_2 [x_1(t) - x_1^*], \tag{17}
\]

in which $0 < b_1 b_2 < 1$. So the updating process of $x_1(t)$ will converge to $x_1^*$. If
$x_1(t) < f_2^{-1}(1)$, then we have $x_2(t + 1) = 1$, $x_1(t + 2) = f_1(1) > f_2^{-1}(1)$, which
again enters the regime of (17), and thus $x_1(t)$ will converge to $x_1^*$ and $x_2(t)$ will
converge to $\mathrm{BR}_2(x_1^*) = x_2^*$. The analysis of Sequence[b] yields the same result, and
we therefore see that under (16), both Sequence[a] and Sequence[b] converge to the
unique NE.

b) When the condition

\[
\frac{c_{1,2}}{c_{2,2}} > \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} < \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1} \tag{18}
\]

is satisfied (cf. Figure 4), similar to the discussion above, we can verify that both
Sequence[a] and Sequence[b] converge to the NE $(f_1(1), 1)$ given any initial strategies
$(x_1(0), x_2(0))$. For the other symmetric case, both sequences will converge to the NE
$(1, f_2(1))$.

Therefore, when the COG has only one NE, SMBRD will always converge to the NE.

2) When the COG has two NEs, by symmetry we only discuss the case where the conditions

\[
\frac{c_{1,2}}{c_{2,2}} > \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} = \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1} \tag{19}
\]

are satisfied (cf. Figure 6). Note that with these conditions, $b_1 b_2 > 1$. Similar to the discussion
above, we see that when $x_1(0) < 1$, Sequence[a] converges to $(f_1(1), 1)$, but when $x_1(0) = 1$,
Sequence[a] converges to $(1, f_2(1))$. For Sequence[b], when $x_2(0) < f_2(1)$, we have
$x_1(1) = 1$ and $x_2(2) = f_2(1)$, and then Sequence[b] converges to $(1, f_2(1))$; when $x_2(0) >
f_2(1)$, due to $b_1 b_2 > 1$ and

\[
x_2(t + 2) - x_2^* = b_1 b_2 [x_2(t) - x_2^*], \tag{20}
\]

Sequence[b] converges to $(f_1(1), 1)$. Therefore, under the conditions (19), when the initial
strategies $\mathbf{x}(0)$ fall within region I in Figure 6, SMBRD converges to the NE $(f_1(1), 1)$,
and when the initial strategies $\mathbf{x}(0)$ fall within region II, SMBRD ends up cycling
between strategies $(f_1(1), f_2(1))$ and $(1, 1)$.

August 9, 2011 DRAFT

3) When the COG has three NEs, the conditions

\[
\frac{c_{1,2}}{c_{2,2}} > \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} > \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1} \tag{21}
\]

are satisfied (cf. Figure 5). We trace the evolution of the sequences and get the following
result. For Sequence[a], when the initial strategy satisfies $x_1(0) < x_1^*$, it converges to the
NE $(f_1(1), 1)$; otherwise it converges to another NE, $(1, f_2(1))$. For Sequence[b], when
the initial strategy satisfies $x_2(0) < x_2^*$, it converges to the NE $(1, f_2(1))$; otherwise it
converges to another NE, $(f_1(1), 1)$. Therefore, under the conditions (21), when the initial
strategies $\mathbf{x}(0)$ fall within region I in Figure 5, SMBRD converges to the NE $(f_1(1), 1)$;
when the initial strategies $\mathbf{x}(0)$ fall within region II or III, SMBRD ends up cycling
between strategies $(f_1(1), f_2(1))$ and $(1, 1)$; and when the initial strategies $\mathbf{x}(0)$ fall within
region IV, SMBRD converges to the NE $(1, f_2(1))$.

4) When the COG has an infinite number of NEs, the conditions

\[
\frac{c_{1,2}}{c_{2,2}} = \frac{P_2}{P_1} \frac{\alpha + \beta_1}{\beta_2} \quad \text{and} \quad \frac{c_{2,1}}{c_{1,1}} = \frac{P_1}{P_2} \frac{\alpha + \beta_2}{\beta_1} \tag{22}
\]

are satisfied (cf. Figure 7). For Sequence[a], when the initial strategy satisfies $x_1(0) <
f_1(1)$, it converges to the NE $(f_1(1), 1)$; otherwise it converges to the NE $(x_1(0), f_2(x_1(0)))$.
For Sequence[b], when the initial strategy satisfies $x_2(0) < f_2(1)$, it converges to the NE
$(1, f_2(1))$; otherwise it converges to the NE $(f_1(x_2(0)), x_2(0))$. Therefore, under the conditions
(22), when the initial strategies $\mathbf{x}(0)$ fall within region I in Figure 7, SMBRD ends up
cycling between strategies $(f_1(1), x_2(0))$ and $(f_1(x_2(0)), 1)$; when $\mathbf{x}(0)$ falls within
region II, SMBRD ends up cycling between strategies $(f_1(x_2(0)), f_2(x_1(0)))$ and
$(x_1(0), x_2(0))$; when $\mathbf{x}(0)$ falls within region III, SMBRD ends up cycling between
strategies $(f_1(1), f_2(1))$ and $(1, 1)$; and when $\mathbf{x}(0)$ falls within region IV, SMBRD ends
up cycling between strategies $(x_1(0), f_2(1))$ and $(1, f_2(x_1(0)))$.
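The two-step recursions (17) and (20) used in the case analysis above follow from composing the two linear best responses. The short derivation below spells this out for Sequence[a], under the assumption that neither best response saturates at 1 during the two steps considered; the explicit expressions for $a_k$ and $b_k$ are read off from (6) and (7).

```latex
\begin{align*}
a_1 &= \frac{\beta_1}{\alpha + \beta_1} \left( 1 + \frac{P_2 c_{2,1}}{P_1 c_{1,1}} \right), &
b_1 &= \frac{\beta_1}{\alpha + \beta_1} \frac{P_2 c_{2,1}}{P_1 c_{1,1}}, \\
a_2 &= \frac{\beta_2}{\alpha + \beta_2} \left( 1 + \frac{P_1 c_{1,2}}{P_2 c_{2,2}} \right), &
b_2 &= \frac{\beta_2}{\alpha + \beta_2} \frac{P_1 c_{1,2}}{P_2 c_{2,2}}, \\[4pt]
x_2(t + 1) &= a_2 - b_2 x_1(t), \\
x_1(t + 2) &= a_1 - b_1 x_2(t + 1) = a_1 - a_2 b_1 + b_1 b_2 \, x_1(t), \\
x_1^* &= a_1 - a_2 b_1 + b_1 b_2 \, x_1^*
  \quad \text{(the intersection is a fixed point of the same map)}, \\
\Longrightarrow \quad x_1(t + 2) - x_1^* &= b_1 b_2 \left[ x_1(t) - x_1^* \right], \\
b_1 b_2 &= \frac{\beta_1 \beta_2}{(\alpha + \beta_1)(\alpha + \beta_2)} \cdot \frac{c_{2,1} c_{1,2}}{c_{1,1} c_{2,2}}.
\end{align*}
```

Under (16) each factor satisfies $b_1 < (\alpha + \beta_2)/(\alpha + \beta_1)$ and $b_2 < (\alpha + \beta_1)/(\alpha + \beta_2)$, so $b_1 b_2 < 1$ and the deviation contracts; under (21) both inequalities reverse, so $b_1 b_2 > 1$ and the deviation grows until one strategy saturates at 1; under (22) the product equals 1.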

Summarizing the discussion above for various conditions, we see that both Sequence[a] and 
Sequence[b], which are respectively determined by the initial strategies £i(0) and x 2 (0), converge 
to limiting strategies. However, under several conditions, the two sequences do not converge to 
the same limiting strategies, and then SMBRD does not converge. On the other hand, because 
an AMBRD corresponds to exactly one of the two sequences, the convergence of AMBRD is 
guaranteed. So for two-player COGs we obtain the following theorem regarding the convergence 
property of best-response dynamics. 

Theorem 4 (Convergence of two -player dynamics): For a two-player COG, SMBRD is guar- 
anteed to converge to an NE starting from an arbitrary x(0) if and only if the COG has only 
one NE; in contrast, AMBRD always converges to an NE regardless of the number of NEs in 
the COG. 

The fact that AMBRD always converges for any initial joint strategies is desirable from an 
engineering perspective. It guarantees that a distributed network protocol designed based on 
AMBRD does terminate after a sufficient number of updates, avoiding the so-called "ping-pong 
effect". 
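
The contrast stated in Theorem 4 can be illustrated with a minimal sketch. The best-response function below is a hypothetical saturated-linear stand-in (its intercept and slope are illustrative values, not parameters of the COG in this paper), chosen so that the two-player game has three NEs; the names `best_response`, `smbrd` and `ambrd` are likewise assumptions made for this example.

```python
def best_response(other, intercept=1.75, slope=1.5):
    """Hypothetical saturated-linear best response, clipped to [0, 1].

    This is an illustrative stand-in, not the COG best-response function.
    """
    return min(1.0, max(0.0, intercept - slope * other))


def smbrd(x1, x2, steps):
    """Simultaneous moves: both players respond to the previous joint strategy."""
    path = [(x1, x2)]
    for _ in range(steps):
        x1, x2 = best_response(x2), best_response(x1)
        path.append((x1, x2))
    return path


def ambrd(x1, x2, steps):
    """Alternating moves: player 1 updates first, then player 2, and so on."""
    path = [(x1, x2)]
    for t in range(steps):
        if t % 2 == 0:
            x1 = best_response(x2)  # player 1 reacts to the current x2
        else:
            x2 = best_response(x1)  # player 2 reacts to the updated x1
        path.append((x1, x2))
    return path


sm_path = smbrd(0.3, 0.3, 6)
am_path = ambrd(0.3, 0.3, 6)
print("SMBRD tail:", sm_path[-2:])
print("AMBRD tail:", am_path[-2:])
```

With these illustrative values, the simultaneous updates keep alternating between (1, 1) and (0.25, 0.25), neither of which is an NE, while the alternating updates settle at (1, 0.25), which is a fixed point of both best responses.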



C. Convergence Analysis for K-player COGs with Linear Best-responses

The analysis of the convergence property becomes difficult in the more general case in which a
COG consists of $K > 2$ players, due to the lack of convenient properties
such as those of potential games³ or supermodular games.⁴ In the following, we analyze the convergence
property of K-player COGs which admit linear best-response functions.

The best-response function implies that player $k$ will allocate all its power to its private
band when the aggregated interference from other players in the shared band exceeds a threshold.
When the interference is not too strong, the best-response functions become linear without

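3 The COG is not an exact potential game except for very special choices of network parameters, since in general the
mixed second-order partial derivatives of different players' utilities with respect to the pair of strategies $(x_i, x_j)$ are not equal [13], and we were unsuccessful in verifying whether it is an ordinal potential game.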



4 Only for two-player COGs is it possible to convert the COGs into supermodular games [8]; in general COGs are not
supermodular, since the condition that the mixed second-order partial derivative with respect to $x_i$ and $x_j$ be nonnegative does not always hold for all $x_i, x_j$.






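saturation. This occurs if

$$\frac{\beta_k}{\alpha + \beta_k}\left[1 + \frac{\sum_{j \in \mathcal{K}\setminus\{k\}} (1 - x_j) P_j c_{j,k}}{P_k c_{k,k}}\right] < 1 \tag{23}$$

holds for each player $k \in \mathcal{K}$. To ensure that the best-response functions are linear given any
initial strategies throughout the execution of best-response dynamics, we need (23) to hold for
any $x_k \in [0, 1]$, $k \in \mathcal{K}$. This consideration hence leads to the conditions

$$\sum_{j \in \mathcal{K}\setminus\{k\}} P_j c_{j,k} < \frac{\alpha}{\beta_k} P_k c_{k,k}, \quad \forall k \in \mathcal{K}. \tag{24}$$
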
We first establish that under conditions (24) a COG has a unique NE.
Theorem 5: A K-player COG satisfying (24) has a unique NE.

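Proof: Under the assumption of (24), all the best-response functions are linear, and by
rearranging terms we see that the NEs of the COG should satisfy

$$x^* = A x^* + b, \tag{25}$$

where $A$ is a $K \times K$ matrix in which $A(k,j) = -\frac{\beta_k P_j c_{j,k}}{(\alpha + \beta_k) P_k c_{k,k}}$ for $j \neq k$ and $A(k,k) = 0$, and
$b(k) = \frac{\beta_k}{\alpha + \beta_k}\left(1 + \frac{\sum_{j \neq k} P_j c_{j,k}}{P_k c_{k,k}}\right)$, for $k = 1, \ldots, K$. From (24), we have

$$\sum_{j \neq k} |A(k,j)| = \frac{\beta_k}{\alpha + \beta_k} \cdot \frac{\sum_{j \neq k} P_j c_{j,k}}{P_k c_{k,k}} < \frac{\beta_k}{\alpha + \beta_k} \cdot \frac{\alpha}{\beta_k} = \frac{\alpha}{\alpha + \beta_k} < 1, \tag{26}$$

for all $k = 1, \ldots, K$. Hence from Gershgorin's circle theorem (see, e.g., [14]), the bound (26)
implies that the maximum eigenvalue of matrix $A$ satisfies $|\lambda_{\max}| < 1$. Hence the matrix $I - A$
is nonsingular, so that the solution of $x^* = A x^* + b$ is unique, given by $x^* = (I - A)^{-1} b$, which lies in
$(0, 1)^K$ since a COG always possesses at least one NE. ■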
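
The steps of this proof can be checked numerically with a minimal sketch. It assumes the linear best-response form in which $x_k$ is the fraction of player $k$'s power kept in its private band, so that $A(k,j) = -\beta_k P_j c_{j,k} / ((\alpha+\beta_k) P_k c_{k,k})$; the channel gains below are hypothetical values chosen to satisfy (24), and the helper names (`linear_best_response_system`, `condition_24_holds`, `solve_linear`) are illustrative.

```python
def linear_best_response_system(alpha, beta, power, gain):
    """Return (A, b) of the linear best-response map x = A x + b.

    gain[j][k] plays the role of c_{j,k}; all numbers used here are hypothetical.
    """
    K = len(beta)
    A = [[0.0] * K for _ in range(K)]
    b = [0.0] * K
    for k in range(K):
        scale = beta[k] / (alpha + beta[k])
        own = power[k] * gain[k][k]
        total = 0.0
        for j in range(K):
            if j != k:
                ratio = power[j] * gain[j][k] / own
                A[k][j] = -scale * ratio
                total += ratio
        b[k] = scale * (1.0 + total)
    return A, b


def condition_24_holds(alpha, beta, power, gain):
    """Check the weak-interference condition (24) for every player."""
    K = len(beta)
    return all(
        sum(power[j] * gain[j][k] for j in range(K) if j != k)
        < alpha / beta[k] * power[k] * gain[k][k]
        for k in range(K)
    )


def solve_linear(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


alpha = 0.5
beta = [0.5 / 3] * 3
power = [1.0] * 3
gain = [[1.0, 0.6, 0.9],   # hypothetical gains, not taken from the paper
        [0.8, 1.0, 0.7],
        [0.5, 1.1, 1.0]]

A, b = linear_best_response_system(alpha, beta, power, gain)
row_sums = [sum(abs(v) for v in row) for row in A]
I_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(3)] for i in range(3)]
x_star = solve_linear(I_minus_A, b)
print("absolute row sums of A:", row_sums)
print("unique fixed point x*:", x_star)
```

Under these assumed values every absolute row sum of $A$ stays below $\alpha/(\alpha+\beta_k)$, as in (26), and the computed fixed point lies strictly inside the unit cube.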
The convergence of SMBRD is a direct consequence of Theorem 5.

Theorem 6: For a K-player COG satisfying (24), SMBRD is guaranteed to converge to the
unique NE.

Proof: The updating process of SMBRD can be written as the following iteration,

$$x(t + 1) = A x(t) + b. \tag{27}$$

From the proof of Theorem 5 we see that the iteration (27) is globally asymptotically stable
since all eigenvalues of matrix $A$ satisfy $|\lambda_k| < 1$. So we conclude that under (24), SMBRD is
guaranteed to converge to the unique NE given by $x^* = (I - A)^{-1} b$. ■
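
The iteration (27) is a Jacobi-type fixed-point iteration, and its behavior can be sketched as follows. The matrix `A` and vector `b` are hypothetical numbers with zero diagonal and absolute row sums below one (they are not taken from the simulations of this paper), and `run_smbrd` is an illustrative name.

```python
def smbrd_step(A, b, x):
    """One simultaneous update x(t+1) = A x(t) + b."""
    return [sum(a * v for a, v in zip(row, x)) + bk for row, bk in zip(A, b)]


def run_smbrd(A, b, x0, tol=1e-10, max_updates=1000):
    """Iterate until the largest per-player change drops below tol."""
    x = list(x0)
    for t in range(1, max_updates + 1):
        x_next = smbrd_step(A, b, x)
        if max(abs(u - v) for u, v in zip(x_next, x)) < tol:
            return x_next, t
        x = x_next
    return x, max_updates


# Hypothetical linear best-response map with absolute row sums < 1.
A = [[0.0, -0.2, -0.125],
     [-0.15, 0.0, -0.275],
     [-0.225, -0.175, 0.0]]
b = [0.575, 0.675, 0.65]

limit_a, updates_a = run_smbrd(A, b, [0.0, 0.0, 0.0])
limit_b, updates_b = run_smbrd(A, b, [1.0, 0.2, 0.9])
print(limit_a, updates_a)
print(limit_b, updates_b)
```

Both runs end at the same point regardless of the initial joint strategy, which is what global asymptotic stability of (27) predicts for this assumed `A`.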

Establishing the convergence of AMBRD is similar but slightly more involved, as shown
in the proof of the following theorem.

Theorem 7: For a K-player COG satisfying (24), AMBRD is guaranteed to converge to the
unique NE.

Proof: Under the assumption of (24), when player $k$ updates its strategy, the updating
process of AMBRD can be written as

$$x(t + 1) = A_k x(t) + b_k, \tag{28}$$

where $A_k$ is a $K \times K$ identity matrix, except that its $k$-th row is replaced by the $k$-th row
of the matrix $A$. The elements of the length-$K$ vector $b_k$ are all zero except that its $k$-th element
is the $k$-th element of $b$. So if we view a full updating cycle from player 1 to player $K$ as a
whole, the updating iteration is

$$x((i + 1)K) = \prod_{k=K}^{1} A_k \, x(iK) + \tilde{b}, \quad i = 0, 1, \ldots, \tag{29}$$

where $\prod_{k=K}^{m} A_k$ denotes the product $A_K A_{K-1} \cdots A_m$ and

$$\tilde{b} = \prod_{k=K}^{2} A_k \, b_1 + \prod_{k=K}^{3} A_k \, b_2 + \ldots + A_K b_{K-1} + b_K. \tag{30}$$

So in order to prove that the AMBRD converges, it suffices to establish that all the eigenvalues
of $\prod_{k=K}^{1} A_k$ satisfy $|\lambda_k| < 1$. For this, we again utilize Gershgorin's circle theorem, showing
that the row norm of $\prod_{k=K}^{1} A_k$ is smaller than one.

Denote the $k$-th row elements of $A_k$ by $[a_{k,1}, a_{k,2}, \ldots, a_{k,K}]$, in which $a_{k,k} = 0$ and $a_{k,j} =
A(k,j) = -\frac{\beta_k P_j c_{j,k}}{(\alpha + \beta_k) P_k c_{k,k}}$ for $j \neq k$. Let us trace the calculation of $\prod_{k=K}^{1} A_k$ to check that all
of its absolute row sums are smaller than one. For this, we show by induction the following
claim: each of the first $l$ rows of $A^{(l)} = \prod_{k=l}^{1} A_k$ has its absolute sum smaller than one, for
$l = 1, \ldots, K$.

For $l = 1$, the claim clearly holds, since the first row of $A^{(1)} = A_1$ is the first row of $A$. For $l = 2$, $A^{(2)} = A_2 A_1$. Its first row remains
$[a_{1,1}, a_{1,2}, \ldots, a_{1,K}]$, whose absolute row sum is smaller than one, by the assumption of (24). Its
second row is

$$[a_{2,1} a_{1,1}, \; (a_{2,1} a_{1,2} + a_{2,2}), \; (a_{2,1} a_{1,3} + a_{2,3}), \; \ldots, \; (a_{2,1} a_{1,K} + a_{2,K})],$$

whose absolute row sum is

$$|a_{2,1} a_{1,1}| + \sum_{j=2}^{K} |a_{2,1} a_{1,j} + a_{2,j}| \le |a_{2,1}| \sum_{j=1}^{K} |a_{1,j}| + \sum_{j=2}^{K} |a_{2,j}| \le |a_{2,1}| + \sum_{j=2}^{K} |a_{2,j}| < 1.$$


So the claim holds for $l = 2$. Now assume that the claim holds up to $l$, and examine $A^{(l+1)}$.
For notational simplicity we denote the $i$-th row elements of $A^{(l)}$ by $[a^{(l)}_{i,1}, a^{(l)}_{i,2}, \ldots, a^{(l)}_{i,K}]$. Since
$A^{(l+1)} = A_{l+1} A^{(l)}$, we see that its first $l$ rows remain those of $A^{(l)}$, and thus their absolute
row sums are smaller than one by assumption. For its $(l + 1)$-th row, the first $l$ elements are

$$a^{(l+1)}_{l+1,j} = a_{l+1,1} a^{(l)}_{1,j} + a_{l+1,2} a^{(l)}_{2,j} + \cdots + a_{l+1,l} a^{(l)}_{l,j}, \quad j = 1, \ldots, l,$$

and the last $K - l$ elements are

$$a^{(l+1)}_{l+1,j} = a_{l+1,1} a^{(l)}_{1,j} + a_{l+1,2} a^{(l)}_{2,j} + \cdots + a_{l+1,l} a^{(l)}_{l,j} + a_{l+1,j}, \quad j = l + 1, \ldots, K.$$

So we have

$$\sum_{j=1}^{K} \left|a^{(l+1)}_{l+1,j}\right| \le \sum_{i=1}^{l} |a_{l+1,i}| \sum_{j=1}^{K} \left|a^{(l)}_{i,j}\right| + \sum_{j=l+1}^{K} |a_{l+1,j}| \le \sum_{i=1}^{l} |a_{l+1,i}| + \sum_{j=l+1}^{K} |a_{l+1,j}| < 1,$$

where the first inequality is from $|x + y| \le |x| + |y|$, the second inequality is from the assumption
of induction for $A^{(l)}$, and the third inequality is from the assumption (24) for $A_{l+1}$. As we let $l$ increase
from 1 to $K - 1$, we establish the claim that each of the rows of $A^{(K)} = \prod_{k=K}^{1} A_k$ has its absolute
sum smaller than one, and thus Gershgorin's circle theorem guarantees that all the eigenvalues
of $\prod_{k=K}^{1} A_k$ satisfy $|\lambda_k| < 1$.

Consequently, the iteration (29) has a unique fixed point

$$x^* = \left(I - \prod_{k=K}^{1} A_k\right)^{-1} \tilde{b}, \tag{31}$$

where $\tilde{b}$ is the constant vector defined in (30). From the nature of AMBRD, this fixed point is an NE of the underlying COG. On the other hand, Theorem
5 indicates that the NE is unique under (24). Therefore we see that the fixed point in (31) has
to coincide with the unique NE of the COG given in Theorem 5, i.e., $x^* = (I - A)^{-1} b$. ■
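
The construction used in this proof can be traced numerically with a minimal sketch. The matrix `A` and vector `b` are hypothetical numbers with zero diagonal and absolute row sums below one, and the helper names (`player_matrix`, `cycle_matrix`, `run_ambrd`) are illustrative. The sketch forms the per-player matrices $A_k$, multiplies them into the full-cycle matrix $A_K \cdots A_1$, and runs the alternating updates, which have the structure of a Gauss-Seidel sweep on $(I - A)x = b$.

```python
def player_matrix(A, k):
    """Identity matrix whose k-th row is replaced by the k-th row of A."""
    K = len(A)
    M = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(K)]
    M[k] = list(A[k])
    return M


def matmul(X, Y):
    """Plain matrix product of two square lists of lists."""
    n = len(X)
    return [[sum(X[i][m] * Y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def cycle_matrix(A):
    """Product A_K ... A_2 A_1 for one full round of alternating updates."""
    M = player_matrix(A, 0)
    for k in range(1, len(A)):
        M = matmul(player_matrix(A, k), M)
    return M


def run_ambrd(A, b, x0, rounds):
    """Players 1..K update in turn, each using the latest joint strategy."""
    x = list(x0)
    for _ in range(rounds):
        for k in range(len(x)):
            x[k] = sum(a * v for a, v in zip(A[k], x)) + b[k]
    return x


# Hypothetical linear best-response map with absolute row sums < 1.
A = [[0.0, -0.2, -0.125],
     [-0.15, 0.0, -0.275],
     [-0.225, -0.175, 0.0]]
b = [0.575, 0.675, 0.65]

M = cycle_matrix(A)
cycle_row_sums = [sum(abs(v) for v in row) for row in M]
x_ambrd = run_ambrd(A, b, [0.0, 1.0, 0.5], rounds=60)
print("absolute row sums of A_K...A_1:", cycle_row_sums)
print("AMBRD limit:", x_ambrd)
```

For this assumed `A`, every absolute row sum of the cycle matrix is below one, and the point reached by the alternating updates also satisfies $x = Ax + b$, that is, it coincides with the fixed point of the simultaneous iteration.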
V. Numerical Results



In this section, we perform numerical simulations to illustrate the analysis in Section IV.
First, we fix the spectrum allocation as $\alpha = 0.5$, $\beta_k = 0.5/K$, $k = 1, \ldots, K$. The average
power budgets for all the WSPs are identical, $P_k = 1$, and the power spectral density of the white
Gaussian noise is $n = 10^{-2}$. We use Monte Carlo simulation, in which the initial joint strategies
are uniformly distributed over the entire product strategy space, to verify the relationship between
the network parameters and the convergence properties of the best-response dynamics. In simulation,
we view an iteration as converged when either the condition

$$\max_k |x_k(t + 1) - x_k(t)| < \epsilon = 10^{-2} \tag{32}$$

is met for SMBRD, or the condition

$$\max_k |x_k(t + j) - x_k(t + j - K)| < \epsilon = 10^{-2}, \quad \forall j = 1, \ldots, K \tag{33}$$

is met for AMBRD. The maximum number of updates in an iteration is set as 100.
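
The two stopping rules can be written as small predicates; the following is a minimal sketch under the reading that (32) compares consecutive joint strategies and (33) compares each of the last $K$ joint strategies with the one recorded $K$ updates earlier. The function names and the `history` list (one joint strategy per update, oldest first) are assumptions made for this example.

```python
def smbrd_converged(x_prev, x_curr, eps=1e-2):
    """Criterion (32): the largest per-player change in one update is below eps."""
    return max(abs(c - p) for p, c in zip(x_prev, x_curr)) < eps


def ambrd_converged(history, K, eps=1e-2):
    """Criterion (33): for j = 1..K, x(t+j) is within eps of x(t+j-K).

    history holds the joint strategies in time order; the last 2K entries
    are compared pairwise at a lag of K updates.
    """
    if len(history) < 2 * K:
        return False
    for j in range(1, K + 1):
        later = history[-j]
        earlier = history[-j - K]
        if max(abs(a - b) for a, b in zip(later, earlier)) >= eps:
            return False
    return True


MAX_UPDATES = 100  # cap on the number of updates per run, as stated in the text

settled = [[0.4, 0.6]] * 4
moving = [[0.1, 0.1], [0.3, 0.1], [0.3, 0.5], [0.6, 0.5]]
print(smbrd_converged([0.5, 0.5], [0.505, 0.5]))
print(ambrd_converged(settled, K=2), ambrd_converged(moving, K=2))
```

A run that reaches `MAX_UPDATES` without either predicate returning true would be counted as not converged.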

When a COG has only two players, the convergence properties of SMBRD and AMBRD obtained by
simulation are depicted in Figure 8. We consider four sets of network parameters: $(c_{1,2}/c_{2,2}, c_{2,1}/c_{1,1}) =
(0.4, 0.6), (3, 4), (3.5, 4), (3, 3)$, which correspond to four different cases: a COG having one NE,
two NEs, three NEs and an infinite number of NEs. We use empirical cumulative distribution
functions to characterize each best-response dynamic's convergence property. From Figure 8 we
verify that AMBRD always converges to an NE for all cases, and that SMBRD only converges
when a COG has a unique NE. Nevertheless, when a COG has a unique NE, SMBRD converges
more quickly than AMBRD.

When a COG has four players, we perform a corresponding simulation, with the convergence
property depicted in Figure 9. We consider three different interference matrices:

$$\begin{bmatrix} 1 & 0.6 & 1.4 & 1.6 \\ 1.4 & 1 & 0.9 & 1.4 \\ 2.3 & 1.4 & 1 & 2.0 \\ 0.9 & 0.7 & 1.4 & 1 \end{bmatrix}$$

They respectively correspond to weak interference (conditions (24) satisfied for all players),
medium interference (conditions (24) satisfied for all but one player), and strong interference
(conditions (24) violated for all players). From the simulation results, we verify the validity
of the analysis in Section IV-C for the case of weak interference. Furthermore, we observe that
SMBRD and AMBRD may also converge to an NE even when the best-response functions are
nonlinear with saturation, although our analysis in Section IV is not able to guarantee this. When the
interference is weak, SMBRD converges more quickly than AMBRD, whereas as the interference
becomes strong, AMBRD converges more quickly than SMBRD.

VI. Conclusion 

In this paper, motivated by the emerging use case of capacity offload, we considered the
interference management problem in which different WSPs allocate their transmission power
between their own private bands and a shared band that is simultaneously available
to all of the WSPs. Taking into account the non-cooperative relationship among the WSPs, we
formulated the problem as a non-cooperative game and analyzed its properties. We further
proposed two distributed learning dynamics with which each WSP individually learns from its local
measurements to reach an NE, and analyzed the convergence properties of the dynamics. A
number of topics may be explored in future research, including establishing the convergence
properties for general K-player COGs without linear best-responses, cooperative game-theoretic
formulations, and the design of effective mechanisms for improving the overall utilities of WSPs and
even spectrum allocators.
(a) System model
Fig. 1. An example of typical capacity offload scenarios
(a) Transmission in private bands
(b) Transmission in the shared band
Fig. 2. Mathematical model of capacity offload
Fig. 3. Case 1) of Theorem 3. The network parameters satisfy $f_2^{-1}(1) < f_1(1)$ and $f_1^{-1}(1) < f_2(1)$, and thus the two
best-response functions intersect at only one point as indicated by $(x_1^*, x_2^*)$, corresponding to the unique NE in the COG.




Fig. 4. Cases 2) and 3) of Theorem 3. Here we depict the situation for case 2) only, and that for case 3) is similar. The
network parameters satisfy $f_2^{-1}(1) > f_1(1)$ and $f_1^{-1}(1) < f_2(1)$, and thus the two best-response functions intersect only at
$(f_1(1), 1)$.
Fig. 5. Case 4) of Theorem 3. The network parameters satisfy $f_2^{-1}(1) > f_1(1)$ and $f_1^{-1}(1) > f_2(1)$, and thus the two
best-response functions intersect at three points, as indicated in the figure.




Fig. 6. The special subcase with two NEs of case 4). The network parameters satisfy $f_2^{-1}(1) > f_1(1)$ and $f_1^{-1}(1) = f_2(1)$,
and thus one of the intersecting points in Figure 5 coincides with $(1, f_2(1))$, where $f_2(1) = \frac{\beta_2}{\alpha + \beta_2}$. The other possibility, $f_2^{-1}(1) = f_1(1)$
and $f_1^{-1}(1) > f_2(1)$, is similar and thus omitted for conciseness.
Fig. 7. The special subcase with an infinite number of NEs of case 4). The network parameters satisfy $f_2^{-1}(1) = f_1(1)$ and
$f_1^{-1}(1) = f_2(1)$, and thus the two slope segments of the best-response functions completely coincide, leading to an infinite
number of NEs.
Fig. 8. Convergence property of two-player COGs

Fig. 9. Convergence property of four-player COGs (plot title: AMBRD vs. SMBRD, K-player)